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Abstract: 

We characterize the matrix sizes for which the Linpack Benchmark 1 .0 matrix generator constructs 
a matrix with identical columns. 

1 Introduction 

Since 1993, twice a year, a list of the sites operating the 500 most powerful computer systems is released by 
the TOP500 project [10]. A single number is used to rank computer systems based on the results obtained 
on the High Performance Linpack Benchmark (HPL Benchmark). 

The HPL Benchmark consists of solving a dense linear system in double precision, 64-bit floating point 
arithmetic, using Gaussian elimination with partial pivoting. The ground rules for running the benchmark 
state that the supplied matrix generator, which uses a pseudo-random number generator, must be used in 
running the HPL benchmark. The supplied matrix generator can be found in High Performance Linpack 
1.0 (HPL-1.0) [9] which is an implementation of the HPL Benchmark. In a HPL benchmark program, 
the correctness of the computed solution is established and the performance is reported in floating point 
operations per sec (flops/sec). It is this number that is used to rank computer systems across the world in the 
TOP500 list. For more on the history and motivation for the HPL Benchmark, see [2]. 

In May 2007, a large high performance computer manufacturer ran a twenty-hour-long HPL Benchmark. 
The run fails with the output result: 

II A x - b | |_oo / ( eps * | |A| |_1 * N ) = 9.224e+94 FAILED 

It turned out that the manufacturer chose n to be n = 2,220,032 = 2 13 • 271. This was a bad choice. In 
this case, the HPL Benchmark 1.0 matrix generator produced a matrix A with identical columns. Therefore 
the matrix used in the test was singular and one of the checks of correctness determined that there was a 
problem with the solution and the results should be considered questionable. The reason for the suspicious 
results was neither a hardware failure nor a software failure but a predictable numerical issue. 

Nick Higham pointed out that this numerical issue had already been detected in 1989 for the LINPACK- 
D benchmark implementation, a predecessor of HPL, and had been reported to the community by David 
Hough [6]. Another report has been made to the HPL developers in 2004 by David Bauer with n = 131,072. 
In this manuscript, we explain why and when the Linpack Benchmark 1.0 matrix generator generates ma- 
trices with identical columns. We define S as the set of all integers such that the Linpack Benchmark 1.0 
matrix generator produces a matrix with at least two identical columns. We characterize and give a simple 
algorithm to determine if a given n is in S. 

Definition 1 We define 5 as the set of all integers such that the Linpack Benchmark 1.0 matrix generator 
produces a matrix with at least two identical columns. For i > 2, we define Si as the set of all integers such 
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65,536 ( 2) 
180,224 ( 2) 
262,144 (32) 
303,104 ( 2) 
344,064 ( 3) 
385,024 ( 2) 
425,984(7) 
466,944 ( 2) 



98,304 ( 2) 
196,608 ( 6) 
270,336 ( 2) 
311,296(3) 
352,256 ( 2) 
393,216 (24) 
434,176(2) 
475,136(4) 



131,072(8) 
212,992 ( 2) 
278,528 ( 3) 
319,488 (2) 
360,448 ( 6) 
401,408 ( 2) 
442,368 ( 4) 
483,328 ( 2) 



147,456 ( 2) 
229,376 ( 4) 
286,720 ( 2) 
327,680 (10) 
368,640 ( 2) 
409,600 ( 4) 
450,560 ( 2) 
491,520 ( 8) 



163,840 ( 3) 
245,760 ( 2) 
294,912 ( 5) 
335,872 ( 2) 
376,832 ( 3) 
417,792 ( 2) 
458,752 (14) 
499,712 ( 2) 



Table 1: The 40 matrix sizes smaller than 500,000 for which the Linpack Benchmark 1.0 matrix generator 
will produce a matrix with identical columns. The number in parenthesis indicates the maximum of the 
number of times each column is repeated. For example, the entry "491,520 ( 8)" indicates that, for the 
matrix size 491,520, there exists one column that is repeated eight times while there exists no column that 
is repeated nine times. 

that the Linpack Benchmark 1.0 matrix generator produces a matrix with at least one column repeated i 
times. 



In Table 1 , for illustration, we give the 40 smallest integers in S along with the largest i for which the 
associated matrix size is in Si- 
Some remarks are in order. 

Remark 1.1 If i > j > 2 then Si C Sj C S. 

Remark 1.2 If n is in S, then the matrix generated by the Linpack Benchmark 1.0 matrix generator 
has at least two identical columns, therefore this matrix is necessarily singular. If n is not in S, the 
coefficient matrix has no identical columns; however we do not claim that the matrix is nonsingular. 
Not being in S is not a sufficient condition for being nonsingular. 

Remark 1.3 In practice, we would like the coefficient matrix to be well-conditioned (since we 
want to numerically solve a linear system of equations associated with them). This is a stronger 
condition than being nonsingular. Edelman in [3] proves that for real n-by-n matrices with elements 
from a standard normal distribution, the expected value of the log of the 2-norm condition number is 
asymptotic to logrc as n — > oo (roughly log«+ 1.537). The Linpack Benchmark 1.0 matrix generator 
uses a uniform distribution on the interval [-0.5, 0.5], for which the expected value of the log of the 2- 
norm condition number is also asymptotic to logrc as n — > oo (roughly 4 log n + 1), see Cuesta-Albertos 
and Wschebor [1]. Random matrices are expected to be well-conditioned; however, pseudo random 
number generator are only an attempt to create randomness and we will see that, in some particular 
cases, the generated matrices have repeated columns and are therefore singular (that is to say infinitely 
ill-conditioned). 

Remark 1.4 HPL-1.0 checks whether a zero-pivot occurs during the factorization and reports it to 
the user. Due to rounding errors, even if the initial matrix has two identical columns, exact-zero pivots 
hardly ever occur in practice. Consequently, it is difficult for benchmarkers to distinguish between 
numerical failures and hardware/software failures. This issue is further investigated in §5. 

Remark 1.5 In Remark 1.3, we stated that we would like the coefficient matrix to be well- 
conditioned. Curiously enough, we will see in §5that the HPL benchmark can successfully return 
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when ran on a matrix with several identical columns. This is due to the combined effect of finite 
precision arithmetic (that transforms a singular matrix into an ill-conditioned matrix) and the use of 
a test for correctness that is independent of the condition number of the coefficient matrix. 

2 How the Linpack Benchmark matrix generator constructs a pseudo- 
random matrix 

The pseudo-random coefficient matrix A from the HPL Benchmark 1.0 matrix generator is generated by the 
HPL subroutine HPL.pdmatgen . c. In this subroutine, the pseudo-random number generator uses a linear 
congruential algorithm (see for example [7, §3.2]) 

X(n + 1) = (a*X(n) +c) mod m, 

with m = 2 31 , a = 1103515245, c = 1235. These choices of m, a and c are fairly standard and we find 
them for example in the standard POSIX.1-2001 or in the GNU libc library for the rand ( ) function. The 
maximum period of a sequence generated by a linear congruential algorithm is at most m, and in our case, 
with HPL-l.O's parameters a and c, we indeed obtain the maximal period 2 31 . (Proof: either by direct check 
or using the Full-Period Theorem, see [7, §3.2]). This provides us with a periodic sequence s such that 
s(i + 2 31 ) = s(i), for any i £ N. HPL-1.0 fills its matrices with pseudo-random numbers by columns using 
this sequence s starting with A(l, 1) = s(l), A(2, 1) = s(2), A(3, 1) = s(3), and so on. 

Definition 2 We define a Linpack Benchmark 1.0 matrix generator, a matrix generator such that 

A(i,j) = s((j-l)*n + i), \<i,j<n. (1) 

and s is such that 

s(i + 2 3i ) = s(i), foranyieN and s(i)^s(j), for any 1 < i,j < 2 31 . (2) 
Some remarks: 

Remark 2.1 The assumption s(i) ^ s(j), for any 1 < i,j < 2 31 is true in the case of the Linpack 
Benchmark 1.0 matrix generator. It can be relaxed to admit more sequences s for which some elements 
can be identical. However this assumption makes the sufficiency proof of the theorem in §4 easier 
and clearer. 

Remark 2.2 It is important to note that the matrix generated by the Linpack Benchmark 1.0 matrix 
generator solely depends on the dimension n. The Linpack Benchmark 1.0 matrix generator requires 
benchmarkers to use the same matrix for any block size, for any number of processors or for any grid 
size. 

Remark 2.3 Moreover, since the Linpack Benchmark 1 .0 matrix generator possesses its own im- 
plementation of the pseudo-random number generator, the computed pseudo-random numbers in the 
sequence s depend weakly on the computer systems. Consequently the pivot pattern of the Gaussian 
elimination is preserved from one computer system to the other, from one year to the other. 

Remark 2.4 Finally, the linear congruential algorithm for the sequence s enables the matrix gener- 
ator for a scalable implementation of the construction of the matrix: each process can generate their 
local part of the global matrix without communicating or generating the global matrix. This property 
is not usual among pseudo-random number generators. 
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Remark 2.5 To give a sense of the magnitude of the size n of matrices, the matrix size for the #1 
entry in the TOP500 list of June 2008 was 2,236,927 which is between 2 21 and 2 22 . The smallest 
matrix size in the TOP 500 list of June 2008 was 273,919 which is between 2 18 and 2 19 . 

Remark 2.6 The pseudo-random number generator has been changed five times in the history of 
the Linpack Benchmark. We recall here some historical facts. 

1980 - LINPACKD-1.0 - The initial LINPACKD benchmark uses a matrix generator based on the 
(Fortran) code below: 

subroutine matgen ( n , a , Ida ) 
real a(lda,*) 
init = 1325 
do 10 j = l,n 
do 20 i = l,n 

init = mod( 3125*init, 65536 ) 
a(i,j) = (init - 32768 . ) /16384 . 
2 continue 
10 continue 
end 

The period of this pseudo-random number generator is: 2 14 = 16, 384. 

1989 - numerical failure report - David Hough [6] observed a numerical failure with the LINPACKD- 
1.0 benchmark for a matrix size n = 512 and submitted his problem as an open question to the 
community through NA-Digest. 

1989 - LINPACKD-2.0 - Two weeks after David Hough's post, Robert Schreiber [8] posted in NA 
Digest an explanation of the problem, he gave credit to Nick Higham and himself for the expla- 
nation. The problem #27.4 in Nick Higham's Accuracy and Stability of Numerical Algorithms 
book [5] is inspired from this story. Higham and Schreiber also provide a patch to improve the 
pseudo-random number generator. Replacing line 6 of the previous code 
init = mod(3125*init, 65536) 

by 

init = mod(3125*init-l, 65536) 

increases the period from 2 14 = 16,384 to 2 16 = 65,536. We call this version LINPACKD-2.0. 

1992 - LINPACKD-3.0 - The pseudo-random number generator of LINPACKD is updated for good 
in 1992 by using the DLARUV LAPACK routine based on Fishman's multiplicative congruen- 
tial method with modulus 2 48 and multiplier 33952834046453 (see [4]). 

2000 - HPL-1.0 - First release of HPL (09/09/2000). The pseudo-random number generator uses a 
linear congruential algorithm (see for example [7, §3.2]) 

X(n + 1) = (a*X(n) + c) mod m, 

with m = 2 31 , a = 1 1035 15245, c = 1235. The period of this pseudo-random number generator 
is 2 31 . 

2004 - numerical failure report - Gregory Bauer observed a numerical failure with HPL and n = 
2 17 = 131,072. History repeats itself. The HPL developers recommended to HPL users willing 
to test matrices of size larger than 2 15 to not use power two. 

2007 - numerical failure report - A large manufacturer observed a numerical failure with HPL and 
n = 2,220,032. History repeats itself again. Note that 2,200,032 = 2 13 • 271, and is not a power 
of two. 
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2008 - HPL-2.0 - This present manuscript explains the problem in the Linpack Benchmark 1.0 ma- 
trix generator. As of September 10th 2008, Piotr Luszczek has incorporated a new pseudo- 
random number generator in HPL-2.0. This pseudo-random number generator uses a linear 
congruential algorithm with a = 6364136223846793005, c = 11 and m = 2 64 . The period of 
this pseudo-random number generator is 2 64 . 

3 Understanding S 

Consider a large dense matrix of order 3 • 10 6 generated by the process described in Definition 2. The 
number of entries in this matrix is 9 • 10 12 which is above the pseudo-random number generator period 
(2 31 w 2.14 • 10 9 ). However, despite this fact, it is fairly likely for the constructed matrix to have distinct 
columns and even to be well-conditioned. 

On the other hand, we can easily generate a "small" matrix with identical columns. Take n=2 16 , we have 
for any i = 1, . . . ,n: 

A(i,2 15 + l)=s(i + n*(j-l)) = s(i + 2 l5 *n)=s(i + 2 l5 *2 16 )=s(i + 2 3l )=s(i) = A(z,l), 

therefore the column 1 and the column 2 15 + 1 are exactly the same. The column 2 and the column 2 15 +2 
are exactly the same, etc. We can actually prove that 2 = 65,536 is the smallest matrix order for which a 
multiple of a column can happen. 

Another example of n G S is n = 2 31 = 2, 147,483,648 for which all columns of the generated matrix 
are the same. Our goal in this section is to build more n in S to have a better knowledge of this set. 

If n is a multiple of 2° = 1 and n > 2 31 then n G J>. (Note that the statement "any n is multiple of 2° = 1 
and n > 2 31 " means n > 2 .) The reasoning is as follows. There are 2 31 indexes from 1 to 2 . Since there 
are at least 2 31 + 1 elements in the first row of A (assumption n > 2 31 ), then, necessarily, at least one index 
(say k) is repeated twice in the first row of A. This is the pigeonhole principle. Therefore we have proved 
the existence of two columns i and j such that they both start with the k-th term of the sequence. If two 
columns start with the index of the sequence, they are the same (since we take the element of the column 
sequentially in the sequence). The three smallest numbers of this type are 

n = 2° *(2 31 + 1) =2, 147,483,649 G S 
« = 2°*(2 31 +2) = 2, 147,483,650 G 5 
n = 2° *(2 31 + 3) =2,147,483,651 G S 

If n is a multiple of 2 1 =2 and n > 2 30 then n G J>. If n is even (n = 2q), then the first row of A accesses 
the numbers of the sequence s using only odd indexes. There are 2 30 odd indexes between 1 and 2 31 . Since 
there are at least 2 30 + 1 elements in the first row of A (assumption n > 2 30 ), then, necessarily, at least one 
index is repeated twice in the first row of A. This is the pigeonhole principle. The three smallest numbers of 
this type are: 

n = 2 1 *(2 29 + l) = 1,073,741, 826 G S 
« = 2 1 *(2 29 + 2) = 1,073,741, 828 G S 
n = 2 1 *(2 29 + 3) = 1,073,741,830 G S. 

If n is a multiple of 2 2 = 4 and n > 2 29 then n G J>. If n is a multiple of 4 (n = 4q), then the first row 
of A accesses the numbers of the sequence s using only (4q + 1 )-indexes. There are 2 29 (4q + 1 )-indexes 
between 1 and 2 31 . Since there are at least 2 29 + 1 elements in the first row of A (assumption n > 2 29 ), then, 
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necessarily, at least one index is repeated twice in the first row of A. This is the pigeonhole principle. The 
first three numbers of this type are: 

« = 2 2 *(2 27 + l) =536,870,916 G 5 
n = 2 2 * (2 27 + 2) = 536, 870, 920 G S 
?2 = 2 2 *(2 27 + 3) = 536,870,924 G 5. 

If n is a multiple of 2 13 and n > 2 18 then n G 5. This gives for example: 

rci2 = 2 13 *(2 5 + l) =2 13 *33 = 270,336 G 5 
ni3 = 2 13 *(2 5 +2) = 2 13 *34 = 278,528 G 5 
7215 = 2 13 *(2 5 +3) = 2 13 *35 = 294,912 G S- 

These three numbers correspond to entries (3,2), (3,3) and (3,5) in Table 1. 

If n is a multiple of 2 14 and n > 2 17 then n G 5. This gives for example: 

« 4 = 2 14 *(2 3 + 1) = 2 14 *9 = 147,456 G S 
n 5 = 2 14 *(2 3 + 2) = 2 14 *10 = 163,840 g5 
«6 = 2 14 *(2 3 + 3) = 2 14 * 11 = 180,224 G5. 

These three numbers correspond to entries (1,4), (1,5) and (2, 1) in Table 1. 

If n is a multiple of 2 15 and n > 2 16 then n G 5. This gives for example: 

« 2 = 2 15 *(2 1 + 1) =2 15 *3 = 98,304 G S 
m = 2 15 *(2 1 +2) =2 15 *4 = 131,072 G 5 
rc 5 = 2 15 *(2 1 +3) =2 15 *5 = 163,840 G S- 

These three numbers correspond to entries (1,2), (1,3) and (1,5) in Table 1. 

If n is a multiple of 2 16 and n > 2 15 then n G S. 

m = 2 16 *(2° + 1) =2 16 *1 = 65,536 G5 
n 3 = 2 16 *(2° + 2) =2 16 *2 = 131,072 g5 
n 7 = 2 16 *(2° + 3) =2 16 *3 = 196,608 G5. 

These three numbers correspond to entries (1,1), (1,3) and (2,2) in Table 1. 

From this section, we understand that any n multiple of 2 k and larger than 2 3l ~ k is in J>. In the next para- 
graph, we prove that this is indeed the only integers in S which provides us with a complete characterization 
of S. 



4 Characterization of S 

Theorem: n G S if and only if the matrix of size n generated by the Linpack Benchmark 1 .0 matrix generator 
has at least two identical columns if and only if 

n > 2 31 ~* where n = 2 k -q with q odd. 

Proof: 
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Let us assume that n is a multiple of 2 k , that is to say 

n = 2 k ■ q, 1 < q 

and let us assume that 

n>2 3l - k . 

In this case, the first row of A accesses the numbers of the sequence s using only (2 k • q + l)-indexes. 
There are 2 3X ~ k (2 k ■ q + l)-indexes between 1 and 2 31 . Since there are at least 2 3i ~ k + 1 elements 
in the first row of A (assumption n > 2 3l ~ k ), then, necessarily, at least one index is repeated twice in 
the first row of A. This is the pigeonhole principle. If two columns start with the same index in the 
sequence, they are the same (since we take the element of the column sequentially in the sequence). 

Assume that there are two identical columns i and j in the matrix generated by the Linpack Benchmark 
1.0 matrix generator (i ^ j). Without loss of generality, assume i > j. The fact that column i is the 
same as column j means that these columns have identical entries, in particular, they share the same 
first entry. We have 

A(V)=A(l,j). 

From this, Equation (1) implies 

s(l + (i-l)n)=s(l + (j-l)n). 

Equation (2) states that all elements in a period of length 2 31 are different, therefore, since i / j, we 
necessarily have 

i + (/-i>i = i + (j-i)?i+2 31 -p, i< P . 

This implies 

(i-j)n = 2 3i -p, \<p. 
We now use the fact that n = 2 k ■ q with q odd and get 

(i-j)-2 k -q = 2 31 -p, l<p, q is odd. 

Since q is odd, this last equality implies that 2 31 is a divisor of (i — j) ■ 2 k . This writes 

(i-j)-2* = 2 31 -r, l<r. 

From which, we deduce that 

(i-j>2*>2 31 . 

A upper bound for i is n, a lower bound for j is 1 ; therefore, 

{n-\)-2 k >2 31 . 

We conclude that, if a matrix of size n generated by the Linpack Benchmark 1.0 matrix generator has 
at least two identical columns, this implies 

n > 2 3l ~ k where n = 2 k ■ q with q odd. 

□ 
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5 Solving (exactly) singular system in finite precision arithmetic with a small 
backward error 



From our analysis, the first matrix size n for which the Linpack Benchmark 1.0 matrix generator will gener- 
ate a matrix with two identical columns is n = 65,536 (see Table 1). However, HPL-1.0 passes all the test 
for correctness on this matrix size. The same for n = 98, 304 which is our second matrix size in the list (see 
Table 1). If we look more carefully at the output file for n = 2, 220, 032, we see that only one out of the three 
test for correctness is triggered: 

I I Ax-b | | _oo / ( eps * | |A| |_1 * N ) = 9.224e+94 FAILED 

I I Ax-b | | _oo / ( eps * | |A| |_1 * ||x||_l ) = 0.0044958 PASSED 

I I Ax-b | | _oo / ( eps * | | A | | _o o * ||x||_oo ) = 0.0000002 PASSED 

Despite the fact that the matrix has identical columns, we observe that HPL-1.0 is enable to pass sometimes 
all the tests, sometimes two tests out of three, sometimes none of the three tests. This section will answer 
how this behavior is possible. First of all, we need to explain how the Linpack Benchmark assesses the 
correctness of an answer. 



5.1 How the Linpack Benchmark program checks a solution 

To verify the result after the LU factorization, the benchmark regenerates the input matrix and the right-hand 
side, then an accuracy check on the residual Ax — b is performed. 

The LINPACKD benchmark checks the accuracy of the solution by returning 

||Ajc — Z?||oo 



where \\A\\m = max,-, ; - |ay|. and £ is the relative machine precision. 
For HPL-1.0, the three following scaled residuals are computed 

||At-fc||oo 



ne 

\\Ax 



lilFlli 
1 1 Ax — ^ 1 1 oo 
m 



A solution is considered numerically correct when all of these quantities are less than a threshold value 
of 16. The last quantity ( ) corresponds to the normwise backward error in the infinite norm allowing 
perturbations on A only [5]. The last two quantities ( r„, r\ ) are independent of the condition number of 
the coefficient matrix A and should always be less than a threshold value of the order of 1 (no matter how 
ill-conditioned A is). 

As of HPL-2.0, the check for correctness is 

r4= \\Ax~b\U 

we ( ||A||oo ||x||oo + ||£>||oo) 

This corresponds to the normwise backward error in the infinite norm allowing perturbations on A and b 
only [5]. A solution is considered numerically correct when this quantity is less than a threshold value of 
16. Although the error analysis of Gaussian elimination with partial pivoting can be done in such a way that 
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b is not perturbed (in other words r„ is the criterion you want to use for Gaussian elimination with partial 
pivoting), HPL-2.0 switches to r$, the usual backward error as found in textbooks. 

This discussion on the check for correctness explains why HPL-1.0 is able to pass the test for correctness 
even though the input matrix is exactly singular. 

5.2 Repeating identical blocks to the underflow 

In [8], Schreiber and Higham explain what happens when a block is repeated k times in the initial coefficient 
matrix A. At each repeat, the magnitude of the pivot (diagonal entries of the U matrix) are divided by £. 
This is illustrated in Figure 1 . This process continues until underflow. Denormalized might help but the 
process is still the same and ultimately a zero pivot is reached, and the algorithm is stopped. In single 
precision arithmetic with e s = 2~ 24 and underflow 2~ 126 , five identical blocks will lead to underflow. In 
double precision arithmetic with £ = 2~ 16 and underflow 2~ 1022 , one will need 64 identical blocks. 




10 1 1 1 1 1 1 1 1 1 1 u 

50 100 150 200 250 300 350 400 450 500 

Figure 1 : Magnitude of the pivot (diagonal entries along the matrix U) for n = 512 = 2 9 and the LINPACK- 
2.0 matrix generator. The period of the LINPACK-2.0 matrix generator is n = 65536 = 2 16 so that, for 
a matrix of size n = 512, columns repeat every 128 column. We observe that pivots are multiplied by 
£ m 2.2 • 10~ 16 at every repetition. 



5.3 Anomalies in Matrix Sizes Reported in the June 2008 Top500 List 

Readers of this manuscript may be surprised to find three entries in the TOP 500 data from June 2008 with 
matrix sizes that lead to matrices with identical columns if the HPL test matrix generator is used. These 
three entries are given in Table 2. For example, the run for the Earth Simulator from 2002 was done with 
n = 1,075,200 which corresponds to 2 11 -525, therefore, the column j = 2 20 = 1,048,576 would have been 
a repeat of the first under our assumptions. The benchmark run on the Earth Simulator in 2002 was done 
with an older version of the test harness. This test harness predates the HPL test harness and uses another 
matrix generator than the one provided by HPL. Today we require the HPL test harness to be used in the 
benchmark run. 
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Rank Site Manufacturer Year NMax 

16 Information Technology Center, The University of Tokyo Hitachi 2008 1,433,600(6) 

49 The Earth Simulator Center NEC 2002 1,075,200(2) 

88 Cardiff University - ARCCA Bull SA 2008 634,880(2) 



Table 2: The three entries in the TOP500 June 2008 list with suspicious n. 

6 How to fix the problem 

Between 1 and 1 • 10 6 , there are 49 matrix sizes in S (see Table 1). Between 1 and 3 • 10 6 , there are 1,546 
matrix sizes in S (see Appendix B). Therefore, for this order of matrix size, there is a good chance to choose 
a matrix size that is not in S- Unfortunately benchmarkers tend to pick multiples of high power of 2 for their 
matrix sizes which increases the likelihood of picking an n G S- 

1 . The obvious recommendation is to choose any n as long as it is odd. In the odd case ifn<2 31 «4-10 9 , 
then n £ S- 

2. A check can be added at the beginning of the execution of the Linpack Benchmark matrix generator. 
The C-code looks as follows: 

long long Int m,n; 
int i,k,t,s; 

s = 31; 

m=n; k=0; while (m%2==0) {k++; m=m/2;} 
m=l; t=0; while (m<=n) {t++; m=m*2;} 
if (t+k>s) i = 1; else i = 0; 

n is the matrix size, 2 s is period of the pseudo-random number generator (s = 3 1 in our case) and i 
is the output flag. If i = 1, then n G S- If i = 0, then n £ S- (The check could also consist of looking 
over the data given in Appendix B). 

3. If n G S, one can simply pad the matrix with an extra line. This can be easily done in the HPL code 
HPL_pdmatgen . c by changing the variable jump3 from M to M+l whenever n G S- 

4. Another possibility is to increase the period of the pseudo-random number generator used. For ex- 
ample, if the pseudo-random number generator had a period of 2 64 and if n < 2 32 , then, assuming 
(i j => s(i) s(j)), entries would never repeat. 

5. A check for correctness robust to ill-conditioned matrix could be used as discussed in §5. 

The Problem with the Linpack Benchmark 1.0 matrix generator is now corrected in the Linpack Bench- 
mark 2.0 Matrix Generator. The fix includes both proposition 4 (extend the period of the pseudo random 
generator) and proposition 5 (have a test for correctness robust to ill-conditioned matrices). 
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A The 1,564 matrix sizes of n from 1 to 3,000,000 for which the Linpack 
Benchmark matrix generator will construct a matrix with identical columns 
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